Goto

Collaborating Authors

 projection operator


Beyond Least Squares: Uniform Approximation and the Hidden Cost of Misspecification

Neural Information Processing Systems

We study the problem of controlling worst-case errors in misspecified linear regression under the random design setting, where the regression function is estimated via (penalized) least-squares. This setting arises naturally in value function approximation for bandit algorithms and reinforcement learning (RL). Our first main contribution is the observation that the amplification of the misspecification error when using least-squares is governed by the Lebesgue constant, a classical quantity from approximation theory that depends on the choice of the feature subspace and the covariate distribution. We also show that this dependence on the misspecification error is tight for least-squares regression: in general, no method minimizing the empirical squared loss, including regularized least-squares, can improve it substantially. We argue this explains the empirical observation that some feature-maps (e.g., those derived from the Fourier bases) "work better in RL" than others (e.g., polynomials): given some covariate distribution, the Lebesgue constant is known to be highly sensitive to choice of the feature-map. As a second contribution, we propose a method that augments the original feature set with auxiliary features designed to reduce the error amplification. We then prove that the method successfully competes with an "oracle" that knows the best way of using the auxiliary features to reduce this amplification. For example, when the domain is a real interval and the features are monomials, our method reduces the amplification factor to O(1)as d, while without our method, least-squares with the monomials (and in fact polynomials) will suffer a worst-case error amplification of order Ω(d). It follows that there are functions and feature maps for which our method is consistent, while least-squares is inconsistent.


Constrained Discrete Diffusion

Neural Information Processing Systems

Discrete diffusion models are a class of generative models that construct sequences by progressively denoising samples from a categorical noise distribution. Beyond their rapidly growing ability to generate coherent natural language, these models present a new and important opportunity to enforce sequence-level constraints, a capability that current autoregressive models cannot natively provide. This paper capitalizes on this opportunity by introducing Constrained Discrete Diffusion (CDD), a novel integration of differentiable constraint optimization within the diffusion process to ensure adherence to constraints, logic rules, or safety requirements for generated sequences. Unlike conventional text generators that often rely on post-hoc filtering or model retraining for controllable generation, CDD directly imposes constraints into the discrete diffusion sampling process, resulting in a training-free and effective approach. Experiments in toxicity-controlled text generation, property-constrained molecule design, and instruction-constrained text completion demonstrate that CDD achieves zero constraint violations in a diverse array of tasks while preserving fluency, novelty, and coherence, while outperforming autoregressive and existing discrete diffusion approaches.




AFoundationModel forZero-shotLogicalQuery Reasoning

Neural Information Processing Systems

EPFO is the average of 9 query types with(,) operators, Negation is the average of 5 query types with the negation operator(). On average, a single ULTRAQUERY model outperforms the best baselines trained specifically on each dataset.






Structured Sparse Regression via Greedy Hard Thresholding

Neural Information Processing Systems

In this paper, we show that such NP-hard projections can not only be avoided by appealing to submodular optimization, but such methods come with strong theoretical guarantees even in the presence of poorly conditioned data (i.e. say when two features have